Overview

Dataset statistics

Number of variables: 70
Number of observations: 26648
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 4.3 MiB
Average record size in memory: 171.0 B

Variable types

BOOL: 55
CAT: 8
NUM: 7

Warnings

flight_code has a high cardinality: 1313 distinct values (High cardinality)
arrival_time has a high cardinality: 906 distinct values (High cardinality)
departure_time_day is highly correlated with departure_time and 1 other field (High correlation)
departure_time is highly correlated with departure_time_day (High correlation)
departure_date is highly correlated with departure_time_day (High correlation)
number_of_stops has 5301 (19.9%) zeros (Zeros)
departure_day has 6957 (26.1%) zeros (Zeros)
booking_day has 3776 (14.2%) zeros (Zeros)
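
The percentages quoted in these warnings can be re-derived from the raw counts and the 26648 observations listed under Dataset statistics. The following is a minimal sketch of that arithmetic; the name `share_pct` and the two dictionaries are illustrative and are not part of pandas-profiling.

```python
# Counts copied from the Overview and Warnings sections of this report.
N_OBSERVATIONS = 26648
ZERO_COUNTS = {"number_of_stops": 5301, "departure_day": 6957, "booking_day": 3776}
DISTINCT_COUNTS = {"flight_code": 1313, "arrival_time": 906}


def share_pct(count, total=N_OBSERVATIONS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


zero_shares = {name: share_pct(n) for name, n in ZERO_COUNTS.items()}
distinct_shares = {name: share_pct(n) for name, n in DISTINCT_COUNTS.items()}

for name, pct in zero_shares.items():
    print(f"{name}: {pct}% zeros")
for name, pct in distinct_shares.items():
    print(f"{name}: {pct}% distinct")
```

The rounded shares agree with the figures the report prints for each variable.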

Reproduction

Analysis started: 2020-10-07 12:45:31.860970
Analysis finished: 2020-10-07 12:46:31.231637
Duration: 59.37 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

df_index
Real number (ℝ≥0)

Distinct: 5730
Distinct (%): 21.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2387.683166
Minimum: 0
Maximum: 5783
Zeros: 7
Zeros (%): < 0.1%
Memory size: 208.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 230
Q1: 1147
Median: 2284.5
Q3: 3431
95-th percentile: 5120
Maximum: 5783
Range: 5783
Interquartile range (IQR): 2284

Descriptive statistics

Standard deviation: 1487.008623
Coefficient of variation (CV): 0.622783058
Kurtosis: -0.8135551969
Mean: 2387.683166
Median Absolute Deviation (MAD): 1141.5
Skewness: 0.3230022456
Sum: 63626981
Variance: 2211194.646
Monotonicity: Not monotonic
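
Several of the derived statistics for df_index follow directly from the other reported figures: the coefficient of variation is the standard deviation divided by the mean, the IQR is Q3 minus Q1, the range is maximum minus minimum, and the variance is the squared standard deviation. A minimal sketch of that check, using only numbers copied from this section (the variable names are illustrative):

```python
# Figures copied from the df_index summary in this report.
mean = 2387.683166
std_dev = 1487.008623
q1, q3 = 1147, 3431
minimum, maximum = 0, 5783

coefficient_of_variation = std_dev / mean  # reported as 0.622783058
interquartile_range = q3 - q1              # reported as 2284
value_range = maximum - minimum            # reported as 5783
variance = std_dev ** 2                    # reported as 2211194.646

print(f"CV       = {coefficient_of_variation:.9f}")
print(f"IQR      = {interquartile_range}")
print(f"Range    = {value_range}")
print(f"Variance = {variance:.3f}")
```

Small differences in the last digits are expected because the inputs are themselves rounded.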
Histogram with fixed size bins (bins=50)

Common values

Value, Count, Frequency (%)
2023, 7, < 0.1%
3321, 7, < 0.1%
2552, 7, < 0.1%
505, 7, < 0.1%
537, 7, < 0.1%
2616, 7, < 0.1%
2632, 7, < 0.1%
585, 7, < 0.1%
601, 7, < 0.1%
713, 7, < 0.1%
Other values (5720), 26578, 99.7%

Minimum 5 values

Value, Count, Frequency (%)
0, 7, < 0.1%
1, 7, < 0.1%
2, 7, < 0.1%
3, 7, < 0.1%
4, 7, < 0.1%

Maximum 5 values

Value, Count, Frequency (%)
5783, 1, < 0.1%
5782, 1, < 0.1%
5781, 1, < 0.1%
5780, 1, < 0.1%
5779, 1, < 0.1%

Air India
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 22751, 85.4%
1, 3897, 14.6%

AirAsia
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 24976, 93.7%
1, 1672, 6.3%

Go Air
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 25297, 94.9%
1, 1351, 5.1%

IndiGo
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 16454, 61.7%
1, 10194, 38.3%

Spicejet
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 20817, 78.1%
1, 5831, 21.9%

Vistara
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 22945, 86.1%
1, 3703, 13.9%

airline
Categorical

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 208.2 KiB

Value, Count, Frequency (%)
IndiGo, 10194, 38.3%
Spicejet, 5831, 21.9%
Air India, 3897, 14.6%
Vistara, 3703, 13.9%
AirAsia, 1672, 6.3%
Go Air, 1351, 5.1%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 9
Median length: 7
Mean length: 7.078054638
Min length: 6

flight_code
Categorical

HIGH CARDINALITY

Distinct: 1313
Distinct (%): 4.9%
Missing: 0
Missing (%): 0.0%
Memory size: 208.2 KiB

Value, Count, Frequency (%)
SG-789, 168, 0.6%
UK-944 | UK-813, 56, 0.2%
SG-8152 | SG-8705, 56, 0.2%
SG-8152 | SG-3724, 56, 0.2%
6E-128, 56, 0.2%
6E-6328 | 6E-167, 56, 0.2%
6E-427 | 6E-2405, 56, 0.2%
6E-6834 | 6E-322, 56, 0.2%
AI-866, 56, 0.2%
6E-426 | 6E-582, 56, 0.2%
Other values (1303), 25976, 97.5%

Frequencies of value counts

Unique

Unique: 128
Unique (%): 0.5%

Histogram of lengths of the category

Length

Max length: 37
Median length: 15
Mean length: 14.89969228
Min length: 6

departure_time
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 208.2 KiB

Value, Count, Frequency (%)
morning, 14568, 54.7%
afternoon, 7556, 28.4%
evening, 4106, 15.4%
night, 418, 1.6%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 9
Median length: 7
Mean length: 7.535725008
Min length: 5

flight_duration
Real number (ℝ≥0)

Distinct: 274
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 494.9568448
Minimum: 0
Maximum: 1435
Zeros: 40
Zeros (%): 0.2%
Memory size: 208.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 115
Q1: 205
Median: 425
Q3: 715
95-th percentile: 1135
Maximum: 1435
Range: 1435
Interquartile range (IQR): 510

Descriptive statistics

Standard deviation: 324.9110891
Coefficient of variation (CV): 0.6564432688
Kurtosis: -0.08552123497
Mean: 494.9568448
Median Absolute Deviation (MAD): 245
Skewness: 0.7614929635
Sum: 13189610
Variance: 105567.2158
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value, Count, Frequency (%)
125, 1061, 4.0%
130, 962, 3.6%
135, 865, 3.2%
420, 464, 1.7%
165, 464, 1.7%
170, 434, 1.6%
330, 428, 1.6%
425, 331, 1.2%
410, 311, 1.2%
635, 297, 1.1%
Other values (264), 21031, 78.9%

Minimum 5 values

Value, Count, Frequency (%)
0, 40, 0.2%
5, 24, 0.1%
10, 53, 0.2%
15, 6, < 0.1%
20, 26, 0.1%

Maximum 5 values

Value, Count, Frequency (%)
1435, 14, 0.1%
1430, 55, 0.2%
1425, 9, < 0.1%
1405, 26, 0.1%
1400, 62, 0.2%

arrival_time
Categorical

HIGH CARDINALITY

Distinct: 906
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Memory size: 208.2 KiB

Value, Count, Frequency (%)
20:30, 509, 1.9%
22:35, 483, 1.8%
20:15, 480, 1.8%
20:40, 420, 1.6%
19:15, 382, 1.4%
20:50, 339, 1.3%
17:30, 319, 1.2%
22:45, 290, 1.1%
20:45, 274, 1.0%
18:20, 273, 1.0%
Other values (896), 22879, 85.9%

Frequencies of value counts

Unique

Unique: 57
Unique (%): 0.2%

Histogram of lengths of the category

Length

Max length: 41
Median length: 5
Mean length: 13.00682978
Min length: 5

flight_cost
Real number (ℝ≥0)

Distinct: 1060
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5151.78974
Minimum: 2540
Maximum: 20674
Zeros: 0
Zeros (%): 0.0%
Memory size: 208.2 KiB

Quantile statistics

Minimum: 2540
5-th percentile: 3323
Q1: 3797
Median: 4680
Q3: 5910
95-th percentile: 8731
Maximum: 20674
Range: 18134
Interquartile range (IQR): 2113

Descriptive statistics

Standard deviation: 1869.544629
Coefficient of variation (CV): 0.3628922613
Kurtosis: 5.198739453
Mean: 5151.78974
Median Absolute Deviation (MAD): 883
Skewness: 1.906065332
Sum: 137284893
Variance: 3495197.118
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value, Count, Frequency (%)
3797, 1770, 6.6%
3746, 1433, 5.4%
3955, 1232, 4.6%
3481, 1119, 4.2%
3271, 890, 3.3%
5132, 872, 3.3%
4006, 715, 2.7%
4712, 635, 2.4%
3956, 560, 2.1%
4922, 424, 1.6%
Other values (1050), 16998, 63.8%

Minimum 5 values

Value, Count, Frequency (%)
2540, 3, < 0.1%
2955, 53, 0.2%
2956, 64, 0.2%
2957, 202, 0.8%
3061, 7, < 0.1%

Maximum 5 values

Value, Count, Frequency (%)
20674, 4, < 0.1%
20666, 1, < 0.1%
18866, 1, < 0.1%
18486, 1, < 0.1%
18445, 1, < 0.1%

number_of_stops
Real number (ℝ≥0)

ZEROS

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.9828129691
Minimum: 0
Maximum: 5
Zeros: 5301
Zeros (%): 19.9%
Memory size: 208.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 1
Q3: 1
95-th percentile: 2
Maximum: 5
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.6768069427
Coefficient of variation (CV): 0.6886426654
Kurtosis: 1.941358959
Mean: 0.9828129691
Median Absolute Deviation (MAD): 0
Skewness: 0.8233686836
Sum: 26190
Variance: 0.4580676376
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=6)

Common values

Value, Count, Frequency (%)
1, 17470, 65.6%
0, 5301, 19.9%
2, 2979, 11.2%
3, 831, 3.1%
4, 66, 0.2%
5, 1, < 0.1%

Minimum 5 values

Value, Count, Frequency (%)
0, 5301, 19.9%
1, 17470, 65.6%
2, 2979, 11.2%
3, 831, 3.1%
4, 66, 0.2%

Maximum 5 values

Value, Count, Frequency (%)
5, 1, < 0.1%
4, 66, 0.2%
3, 831, 3.1%
2, 2979, 11.2%
1, 17470, 65.6%

departure_date
Categorical

HIGH CORRELATION

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 208.2 KiB

Value, Count, Frequency (%)
2020-10-12, 3718, 14.0%
2020-10-11, 3553, 13.3%
2020-10-07, 3454, 13.0%
2020-10-09, 3445, 12.9%
2020-10-10, 3280, 12.3%
2020-10-05, 3239, 12.2%
2020-10-08, 3133, 11.8%
2020-10-06, 2826, 10.6%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

departure_day
Real number (ℝ≥0)

ZEROS

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.650517863
Minimum: 0
Maximum: 6
Zeros: 6957
Zeros (%): 26.1%
Memory size: 104.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 3
Q3: 5
95-th percentile: 6
Maximum: 6
Range: 6
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 2.145478968
Coefficient of variation (CV): 0.8094565212
Kurtosis: -1.3641934
Mean: 2.650517863
Median Absolute Deviation (MAD): 2
Skewness: 0.1575318788
Sum: 70631
Variance: 4.603080004
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=7)

Common values

Value, Count, Frequency (%)
0, 6957, 26.1%
6, 3553, 13.3%
2, 3454, 13.0%
4, 3445, 12.9%
5, 3280, 12.3%
3, 3133, 11.8%
1, 2826, 10.6%

Minimum 5 values

Value, Count, Frequency (%)
0, 6957, 26.1%
1, 2826, 10.6%
2, 3454, 13.0%
3, 3133, 11.8%
4, 3445, 12.9%

Maximum 5 values

Value, Count, Frequency (%)
6, 3553, 13.3%
5, 3280, 12.3%
4, 3445, 12.9%
3, 3133, 11.8%
2, 3454, 13.0%

booking_date
Categorical

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 208.2 KiB

Value, Count, Frequency (%)
2020-10-04, 4054, 15.2%
2020-10-02, 4044, 15.2%
2020-10-03, 4037, 15.1%
2020-09-28, 3776, 14.2%
2020-10-01, 3705, 13.9%
2020-09-29, 3518, 13.2%
2020-09-30, 3514, 13.2%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

booking_day
Real number (ℝ≥0)

ZEROS

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.090138097
Minimum: 0
Maximum: 6
Zeros: 3776
Zeros (%): 14.2%
Memory size: 208.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 3
Q3: 5
95-th percentile: 6
Maximum: 6
Range: 6
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.013497147
Coefficient of variation (CV): 0.6515880793
Kurtosis: -1.256603876
Mean: 3.090138097
Median Absolute Deviation (MAD): 2
Skewness: -0.07836505984
Sum: 82346
Variance: 4.054170762
Monotonicity: Increasing

Histogram with fixed size bins (bins=7)

Common values

Value, Count, Frequency (%)
6, 4054, 15.2%
4, 4044, 15.2%
5, 4037, 15.1%
0, 3776, 14.2%
3, 3705, 13.9%
1, 3518, 13.2%
2, 3514, 13.2%

Minimum 5 values

Value, Count, Frequency (%)
0, 3776, 14.2%
1, 3518, 13.2%
2, 3514, 13.2%
3, 3705, 13.9%
4, 4044, 15.2%

Maximum 5 values

Value, Count, Frequency (%)
6, 4054, 15.2%
5, 4037, 15.1%
4, 4044, 15.2%
3, 3705, 13.9%
2, 3514, 13.2%

days_to_depart
Real number (ℝ≥0)

Distinct: 14
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.537038427
Minimum: 1
Maximum: 14
Zeros: 0
Zeros (%): 0.0%
Memory size: 208.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 5
Median: 8
Q3: 10
95-th percentile: 13
Maximum: 14
Range: 13
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.030074316
Coefficient of variation (CV): 0.402024528
Kurtosis: -0.6183467638
Mean: 7.537038427
Median Absolute Deviation (MAD): 2
Skewness: -0.0108659168
Sum: 200847
Variance: 9.181350361
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=14)

Common values

Value, Count, Frequency (%)
8, 3437, 12.9%
7, 3350, 12.6%
9, 2899, 10.9%
6, 2714, 10.2%
10, 2460, 9.2%
5, 2376, 8.9%
11, 1888, 7.1%
4, 1860, 7.0%
3, 1446, 5.4%
12, 1406, 5.3%
Other values (4), 2812, 10.6%

Minimum 5 values

Value, Count, Frequency (%)
1, 454, 1.7%
2, 907, 3.4%
3, 1446, 5.4%
4, 1860, 7.0%
5, 2376, 8.9%

Maximum 5 values

Value, Count, Frequency (%)
14, 497, 1.9%
13, 954, 3.6%
12, 1406, 5.3%
11, 1888, 7.1%
10, 2460, 9.2%

flight_path
Categorical

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 208.2 KiB

Value, Count, Frequency (%)
Bengaluru-New Delhi, 5780, 21.7%
Mumbai-New Delhi, 5184, 19.5%
New Delhi-Mumbai, 4535, 17.0%
Mumbai-Bengaluru, 4360, 16.4%
New Delhi-Kolkata, 4092, 15.4%
New Delhi-Goa, 2000, 7.5%
Mumbai-Goa, 697, 2.6%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 19
Median length: 16
Mean length: 16.42217052
Min length: 10

Bengaluru-New Delhi
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 20868, 78.3%
1, 5780, 21.7%

Mumbai-Bengaluru
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 22288, 83.6%
1, 4360, 16.4%

Mumbai-Goa
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 25951, 97.4%
1, 697, 2.6%

Mumbai-New Delhi
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 21464, 80.5%
1, 5184, 19.5%

New Delhi-Goa
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 24648, 92.5%
1, 2000, 7.5%

New Delhi-Kolkata
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 22556, 84.6%
1, 4092, 15.4%

New Delhi-Mumbai
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 22113, 83.0%
1, 4535, 17.0%

bd__0
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 22872, 85.8%
1, 3776, 14.2%

bd__1
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 23130, 86.8%
1, 3518, 13.2%

bd__2
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 23134, 86.8%
1, 3514, 13.2%

bd__3
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 22943, 86.1%
1, 3705, 13.9%

bd__4
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 22604, 84.8%
1, 4044, 15.2%

bd__5
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 22611, 84.9%
1, 4037, 15.1%

bd__6
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 22594, 84.8%
1, 4054, 15.2%

dd__0
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 19691, 73.9%
1, 6957, 26.1%

dd__1
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 23822, 89.4%
1, 2826, 10.6%

dd__2
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 23194, 87.0%
1, 3454, 13.0%

dd__3
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 23515, 88.2%
1, 3133, 11.8%

dd__4
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 23203, 87.1%
1, 3445, 12.9%

dd__5
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 23368, 87.7%
1, 3280, 12.3%

dd__6
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 23095, 86.7%
1, 3553, 13.3%

departure_time_day
Categorical

HIGH CORRELATION

Distinct: 28
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 208.2 KiB

Value, Count, Frequency (%)
morning-0, 3903, 14.6%
morning-6, 1935, 7.3%
afternoon-0, 1934, 7.3%
morning-4, 1884, 7.1%
morning-5, 1828, 6.9%
morning-2, 1799, 6.8%
morning-3, 1686, 6.3%
morning-1, 1533, 5.8%
afternoon-2, 1069, 4.0%
evening-0, 1009, 3.8%
Other values (18), 8068, 30.3%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%

Histogram of lengths of the category

Length

Max length: 11
Median length: 9
Mean length: 9.535725008
Min length: 7

afternoon-0
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 24714, 92.7%
1, 1934, 7.3%

afternoon-1
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 25879, 97.1%
1, 769, 2.9%

afternoon-2
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 25579, 96.0%
1, 1069, 4.0%

afternoon-3
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 25752, 96.6%
1, 896, 3.4%

afternoon-4
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 25686, 96.4%
1, 962, 3.6%

afternoon-5
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 25729, 96.6%
1, 919, 3.4%

afternoon-6
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 25641, 96.2%
1, 1007, 3.8%

evening-0
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 25639, 96.2%
1, 1009, 3.8%

evening-1
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 26177, 98.2%
1, 471, 1.8%

evening-2
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 26119, 98.0%
1, 529, 2.0%

evening-3
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 26143, 98.1%
1, 505, 1.9%

evening-4
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 26098, 97.9%
1, 550, 2.1%

evening-5
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 26165, 98.2%
1, 483, 1.8%

evening-6
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 26089, 97.9%
1, 559, 2.1%

morning-0
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 22745, 85.4%
1, 3903, 14.6%

morning-1
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 25115, 94.2%
1, 1533, 5.8%

morning-2
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 24849, 93.2%
1, 1799, 6.8%

morning-3
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 24962, 93.7%
1, 1686, 6.3%

morning-4
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 24764, 92.9%
1, 1884, 7.1%

morning-5
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 24820, 93.1%
1, 1828, 6.9%

morning-6
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 24713, 92.7%
1, 1935, 7.3%

night-0
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 26537, 99.6%
1, 111, 0.4%

night-1
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 26595, 99.8%
1, 53, 0.2%

night-2
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 26591, 99.8%
1, 57, 0.2%

night-3
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 26602, 99.8%
1, 46, 0.2%

night-4
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 26599, 99.8%
1, 49, 0.2%

night-5
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 26598, 99.8%
1, 50, 0.2%

night-6
Boolean

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 26.0 KiB

Value, Count, Frequency (%)
0, 26596, 99.8%
1, 52, 0.2%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line, as opposed to its direction, does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
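
The calculation just described can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code pandas-profiling runs; the function name `pearson_r` and the toy data are assumptions made for the example.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of the covariance and of the two variances cancel out,
    # so plain sums of products are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Made-up data: y is an exact increasing linear function of x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
print(pearson_r(xs, ys))
```

Shifting or positively rescaling either list leaves the result unchanged, which is the invariance mentioned above.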

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
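
As a rough sketch of that recipe: replace each value by its rank, then apply the Pearson formula to the ranks. The helper names below are hypothetical, and giving tied values their average rank is an assumption of this example rather than something the report specifies.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when either variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Made-up data: y = x**3 is monotonic but not linear in x.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```

Because only the ordering matters, any strictly increasing transformation of either variable leaves ρ unchanged.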

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
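
Taken literally, that definition is the variant usually called tau-a. The sketch below implements exactly that; the function name is hypothetical, and statistical libraries often default to a tie-adjusted variant (tau-b), so results can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, with no tie adjustment."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    concordant = 0
    discordant = 0
    total_pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        total_pairs += 1
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
        # direction == 0 is a tie: counted in the total, but in neither group
    return (concordant - discordant) / total_pairs


# Made-up data: one swapped pair out of six.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

For the toy data, five of the six pairs are concordant and one is discordant, giving (5 - 1) / 6.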

Phik (φk)

Phik (φk) is a practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
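
To make the bias correction concrete, the following is a minimal sketch of a Bergsma-style corrected Cramér's V computed from two lists of category labels. It is written from the published formula as an illustration and is not taken from the pandas-profiling source; the function name and the choice to return 0.0 for degenerate inputs are assumptions.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equal-length sequences of labels."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    joint = Counter(zip(xs, ys))
    r, k = len(row_totals), len(col_totals)
    if n < 2 or r < 2 or k < 2:
        return 0.0  # assumption: treat degenerate tables as "no association"

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Correction terms: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denom)


# Made-up labels: y is fully determined by x.
xs = ["a"] * 50 + ["b"] * 50
ys = ["x"] * 50 + ["y"] * 50
print(cramers_v_corrected(xs, ys))
```

With the toy labels the result is 1 up to floating-point error, while two unrelated label sequences give a value at or near 0.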

Missing values

Sample

First rows

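(index), df_index, Air India, AirAsia, Go Air, IndiGo, Spicejet, Vistara, airline, flight_code, departure_time, flight_duration, arrival_time, flight_cost, number_of_stops, departure_date, departure_day, booking_date, booking_day, days_to_depart, flight_path, Bengaluru-New Delhi, Mumbai-Bengaluru, Mumbai-Goa, Mumbai-New Delhi, New Delhi-Goa, New Delhi-Kolkata, New Delhi-Mumbai, bd__0, bd__1, bd__2, bd__3, bd__4, bd__5, bd__6, dd__0, dd__1, dd__2, dd__3, dd__4, dd__5, dd__6, departure_time_day, afternoon-0, afternoon-1, afternoon-2, afternoon-3, afternoon-4, afternoon-5, afternoon-6, evening-0, evening-1, evening-2, evening-3, evening-4, evening-5, evening-6, morning-0, morning-1, morning-2, morning-3, morning-4, morning-5, morning-6, night-0, night-1, night-2, night-3, night-4, night-5, night-6
0, 0, 0, 1, 0, 0, 0, 0, AirAsia, I5-302 | I5-741, morning, 540.0, 14:50, 3500, 1, 2020-10-05, 0, 2020-09-28, 0, 7, Mumbai-New Delhi, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, morning-0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1, 1, 0, 1, 0, 0, 0, 0, AirAsia, I5-317 | I5-979, morning, 660.0, 21:45, 3500, 1, 2020-10-05, 0, 2020-09-28, 0, 7, Mumbai-New Delhi, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, morning-0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
2, 2, 0, 1, 0, 0, 0, 0, AirAsia, I5-984 | I5-741, night, 860.0, 13:45+1 DAYArrival : New Delhi06 Oct 2020, 3868, 1, 2020-10-05, 0, 2020-09-28, 0, 7, Mumbai-New Delhi, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, night-0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0
3, 3, 0, 0, 1, 0, 0, 0, Go Air, G8-306 | G8-175, evening, 760.0, 10:30+1 DAYArrival : New Delhi06 Oct 2020, 3500, 1, 2020-10-05, 0, 2020-09-28, 0, 7, Mumbai-New Delhi, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, evening-0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
4, 4, 0, 0, 1, 0, 0, 0, Go Air, G8-322, evening, 125.0, 23:40, 3500, 0, 2020-10-05, 0, 2020-09-28, 0, 7, Mumbai-New Delhi, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, evening-0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
5, 5, 0, 0, 1, 0, 0, 0, Go Air, G8-327, morning, 140.0, 08:05, 3868, 0, 2020-10-05, 0, 2020-09-28, 0, 7, Mumbai-New Delhi, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, morning-0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
6, 6, 0, 0, 0, 0, 0, 1, Vistara, UK-960, morning, 125.0, 13:20, 3500, 0, 2020-10-05, 0, 2020-09-28, 0, 7, Mumbai-New Delhi, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, morning-0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
7, 7, 0, 0, 0, 0, 0, 1, Vistara, UK-944, afternoon, 125.0, 15:40, 3500, 0, 2020-10-05, 0, 2020-09-28, 0, 7, Mumbai-New Delhi, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, afternoon-0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
8, 8, 0, 0, 0, 0, 1, 0, Spicejet, SG-790, afternoon, 125.0, 17:20, 3868, 0, 2020-10-05, 0, 2020-09-28, 0, 7, Mumbai-New Delhi, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, afternoon-0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
9, 9, 0, 0, 0, 0, 0, 1, Vistara, UK-940, evening, 125.0, 21:45, 3501, 0, 2020-10-05, 0, 2020-09-28, 0, 7, Mumbai-New Delhi, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, evening-0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0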
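
The sample rows suggest how the date-derived columns relate to one another: days_to_depart matches the gap between booking_date and departure_date, the two day columns match the weekday with Monday as 0, and departure_time_day joins departure_time and departure_day with a hyphen. This is an inference from the rows shown, not something the report states, and the function name below is hypothetical.

```python
from datetime import date


def derive_date_features(booking_date, departure_date, departure_time):
    """Rebuild the date-derived columns from ISO date strings and a time-of-day label."""
    booked = date.fromisoformat(booking_date)
    departs = date.fromisoformat(departure_date)
    departure_day = departs.weekday()  # Monday == 0, Sunday == 6
    return {
        "booking_day": booked.weekday(),
        "departure_day": departure_day,
        "days_to_depart": (departs - booked).days,
        "departure_time_day": f"{departure_time}-{departure_day}",
    }


# Values taken from the first sample row of this report.
features = derive_date_features("2020-09-28", "2020-10-05", "morning")
print(features)
```

The derived values for this row agree with the booking_day, departure_day, days_to_depart and departure_time_day entries in the sample.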

Last rows

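(index), df_index, Air India, AirAsia, Go Air, IndiGo, Spicejet, Vistara, airline, flight_code, departure_time, flight_duration, arrival_time, flight_cost, number_of_stops, departure_date, departure_day, booking_date, booking_day, days_to_depart, flight_path, Bengaluru-New Delhi, Mumbai-Bengaluru, Mumbai-Goa, Mumbai-New Delhi, New Delhi-Goa, New Delhi-Kolkata, New Delhi-Mumbai, bd__0, bd__1, bd__2, bd__3, bd__4, bd__5, bd__6, dd__0, dd__1, dd__2, dd__3, dd__4, dd__5, dd__6, departure_time_day, afternoon-0, afternoon-1, afternoon-2, afternoon-3, afternoon-4, afternoon-5, afternoon-6, evening-0, evening-1, evening-2, evening-3, evening-4, evening-5, evening-6, morning-0, morning-1, morning-2, morning-3, morning-4, morning-5, morning-6, night-0, night-1, night-2, night-3, night-4, night-5, night-6
26638, 5773, 1, 0, 0, 0, 0, 0, Air India, AI-544 | AI-525, afternoon, 940.0, 07:55+1 DAYArrival : Kolkata13 Oct 2020, 9435, 1, 2020-10-12, 0, 2020-10-04, 6, 8, New Delhi-Kolkata, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, afternoon-0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
26639, 5774, 0, 0, 0, 0, 1, 0, Spicejet, SG-8169 | SG-241, evening, 760.0, 08:30+1 DAYArrival : Kolkata13 Oct 2020, 11687, 1, 2020-10-12, 0, 2020-10-04, 6, 8, New Delhi-Kolkata, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, evening-0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
26640, 5775, 0, 0, 0, 0, 1, 0, Spicejet, SG-8169 | SG-487, evening, 160.0, 22:30+1 DAYArrival : Kolkata13 Oct 2020, 4740, 1, 2020-10-12, 0, 2020-10-04, 6, 8, New Delhi-Kolkata, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, evening-0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
26641, 5777, 0, 0, 0, 1, 0, 0, IndiGo, 6E-2192 | 6E-6292, afternoon, 370.0, 20:15, 9435, 1, 2020-10-12, 0, 2020-10-04, 6, 8, New Delhi-Kolkata, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, afternoon-0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
26642, 5778, 0, 0, 0, 1, 0, 0, IndiGo, 6E-277 | 6E-6292, morning, 630.0, 20:15, 11687, 1, 2020-10-12, 0, 2020-10-04, 6, 8, New Delhi-Kolkata, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, morning-0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
26643, 5779, 0, 0, 0, 1, 0, 0, IndiGo, 6E-251 | 6E-349, morning, 410.0, 11:50, 4740, 1, 2020-10-12, 0, 2020-10-04, 6, 8, New Delhi-Kolkata, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, morning-0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
26644, 5780, 1, 0, 0, 0, 0, 0, Air India, AI-504 | AI-776, evening, 780.0, 10:40+1 DAYArrival : Kolkata13 Oct 2020, 4740, 1, 2020-10-12, 0, 2020-10-04, 6, 8, New Delhi-Kolkata, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, evening-0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
26645, 5781, 0, 0, 0, 0, 0, 1, Vistara, UK-995 | UK-775, morning, 450.0, 16:55, 8912, 1, 2020-10-12, 0, 2020-10-04, 6, 8, New Delhi-Kolkata, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, morning-0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
26646, 5782, 0, 0, 0, 0, 0, 1, Vistara, UK-943 | UK-775, morning, 550.0, 16:55, 11176, 1, 2020-10-12, 0, 2020-10-04, 6, 8, New Delhi-Kolkata, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, morning-0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
26647, 5783, 1, 0, 0, 0, 0, 0, Air India, AI-851 | AI-571 | AI-525, night, 190.0, 07:55+1 DAYArrival : Kolkata13 Oct 2020, 4740, 3, 2020-10-12, 0, 2020-10-04, 6, 8, New Delhi-Kolkata, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, night-0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0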